Incident Report: Persistent Stuck Collector Alerts on Mantle and Blast
Date: 2024-03-15
Time: 22:10 (UTC+5:30)
Duration: 4 days 20 hours
Description
A stuck collector was detected on Mantle and Blast, indicating a significant lag in data collection.
Root Cause
These chains appear to produce blocks at a higher rate than the system was designed to handle, so the collectors could not keep up. The underlying problem is a resource constraint.
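As a rough illustration of why block production rate matters here, the sketch below estimates how long a collector needs to clear a backlog. The function name and all rates are hypothetical and were not measured during this incident. The point is only that a collector catches up when its processing rate exceeds the chain's block rate, and never catches up otherwise.

```python
import math


def catch_up_seconds(lag_blocks: float,
                     chain_blocks_per_sec: float,
                     collector_blocks_per_sec: float) -> float:
    """Estimate seconds needed to clear a backlog of `lag_blocks`.

    All inputs are illustrative assumptions, not values from the incident.
    Returns math.inf when the collector is no faster than the chain,
    because the backlog then never shrinks.
    """
    net_rate = collector_blocks_per_sec - chain_blocks_per_sec
    if net_rate <= 0:
        return math.inf
    return lag_blocks / net_rate


# Hypothetical numbers: a backlog of 7200 blocks on a chain producing
# 0.5 blocks per second.
before = catch_up_seconds(7200, 0.5, 1.0)  # collector at 1 block/s
after = catch_up_seconds(7200, 0.5, 2.0)   # collector rate doubled
```

Under these assumed rates, doubling the collector rate cuts the catch-up time to a third. If the chain rate were at or above the collector rate, the estimate would be infinite regardless of backlog size.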
Impact
Data collection for Mantle and Blast was delayed, causing a lag between the current block and the last queried block.
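The lag between the current block and the last queried block can be expressed as a simple check. This is a minimal sketch, not the alerting logic actually in use. The `CollectorStatus` type, the function names, and the `max_lag` threshold of 1000 blocks are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CollectorStatus:
    chain: str
    current_block: int       # latest block reported by the chain
    last_queried_block: int  # last block the collector has processed


def block_lag(status: CollectorStatus) -> int:
    """Return how many blocks the collector is behind (never negative)."""
    return max(0, status.current_block - status.last_queried_block)


def is_stuck(status: CollectorStatus, max_lag: int = 1000) -> bool:
    """Flag the collector as stuck when its lag exceeds `max_lag`.

    The default threshold is a hypothetical value, not the one used
    by the real alert.
    """
    return block_lag(status) > max_lag


# Illustrative block heights only.
mantle = CollectorStatus("mantle", current_block=500_000, last_queried_block=420_000)
print(mantle.chain, block_lag(mantle), is_stuck(mantle))
```

A per-chain threshold may be more appropriate than a single default, since chains with faster block production accumulate lag more quickly.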
Timeline
- 22:10 (Mar 15) - First noticed the issue.
- 04:36 (Mar 16) - Initial diagnosis.
- 04:39 (Mar 16) - Started the fix.
- 18:29 (Mar 20) - Issue resolved.
Lessons Learned
The incident highlighted the need for flexible data collection methods and the importance of having fallback mechanisms in place for RPC issues.
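One possible shape for an RPC fallback mechanism is to try a list of endpoints in order and use the first one that responds. The sketch below assumes a caller-supplied `fetch_block_number` function and hypothetical endpoint URLs. It does not reflect how the collectors are actually wired.

```python
from typing import Callable, Sequence


def latest_block_with_fallback(
    endpoints: Sequence[str],
    fetch_block_number: Callable[[str], int],
) -> int:
    """Return the block number from the first endpoint that responds.

    `fetch_block_number` is a hypothetical callable that queries one
    RPC endpoint and raises an exception on failure.
    """
    errors = []
    for url in endpoints:
        try:
            return fetch_block_number(url)
        except Exception as exc:  # any failure moves on to the next endpoint
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all RPC endpoints failed: " + "; ".join(errors))


# Usage with a stand-in fetcher; the URLs are placeholders.
def fake_fetch(url: str) -> int:
    if "primary" in url:
        raise TimeoutError("request timed out")
    return 123_456


block = latest_block_with_fallback(
    ["https://primary.example/rpc", "https://backup.example/rpc"],
    fake_fetch,
)
```

Passing the fetcher in as a parameter keeps the fallback logic independent of any particular RPC client and makes it easy to test without network access.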
Actions Taken
- Initially, Aaron doubled the resources to address the resource constraint, hoping the collectors would catch up within a few hours.
- Later, gas price strategies were updated for production, and Vekil brought the collectors for Mantle, Base, and Blast up to date.
Related Images/Logs
- Slack escalation link.
Incident Reviewer(s)
- Arda
- Aaron
- Andrew
- Abdel
- Vekil